Greedy Layer-Wise Training of Deep Networks
Authors
Abstract
Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.
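To make the strategy concrete, below is a minimal sketch of greedy layer-wise unsupervised pre-training, assuming a tied-weight sigmoid autoencoder as the per-layer learner (the paper itself studies RBM-based Deep Belief Networks as well as autoencoder variants). The function names, hyperparameters, and toy data are illustrative assumptions, not the authors' implementation: each layer is trained to reconstruct the representation produced by the layers below it, and the resulting weights serve only as an initialization for subsequent supervised fine-tuning.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder_layer(X, n_hidden, lr=0.1, epochs=50, rng=None):
    """Train one tied-weight sigmoid autoencoder layer on X with
    squared-error reconstruction loss and batch gradient descent."""
    rng = rng or np.random.default_rng(0)
    n_visible = X.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        H = sigmoid(X @ W + b_h)          # encode
        R = sigmoid(H @ W.T + b_v)        # decode with tied weights
        err = R - X                       # d(loss)/d(R), up to a constant
        dR = err * R * (1 - R)            # back through decoder sigmoid
        dH = (dR @ W) * H * (1 - H)       # back through encoder sigmoid
        grad_W = X.T @ dH + dR.T @ H      # tied-weight gradient (both paths)
        W -= lr * grad_W / X.shape[0]
        b_h -= lr * dH.mean(axis=0)
        b_v -= lr * dR.mean(axis=0)
    return W, b_h

def greedy_pretrain(X, layer_sizes):
    """Greedy layer-wise pre-training: each new layer is trained on the
    codes produced by the already-trained layers below it."""
    params, codes = [], X
    for n_hidden in layer_sizes:
        W, b = train_autoencoder_layer(codes, n_hidden)
        params.append((W, b))
        codes = sigmoid(codes @ W + b)    # representation fed to next layer
    return params  # use as initialization for supervised fine-tuning

if __name__ == "__main__":
    X = np.random.default_rng(1).random((256, 64))   # toy data in [0, 1]
    stack = greedy_pretrain(X, layer_sizes=[32, 16])
    print([W.shape for W, _ in stack])               # [(64, 32), (32, 16)]
```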
Similar references
A Pipelined Pre-training Algorithm for DBNs
Deep networks have been widely used in many domains in recent years. However, pre-training deep networks with the greedy layer-wise algorithm is time-consuming, and the scalability of this algorithm is greatly restricted by its inherently sequential nature, in which only one hidden layer can be trained at a time. In order to speed up the training of deep networks, this paper mainly focuses on ...
Faster learning of deep stacked autoencoders on multi-core systems using synchronized layer-wise pre-training
Deep neural networks are capable of modelling highly nonlinear functions by capturing different levels of abstraction of the data hierarchically. When training deep networks, the system is first initialized near a good optimum by greedy layer-wise unsupervised pre-training. However, with burgeoning data and increasing dimensions of the architecture, the time complexity of this approach becomes eno...
Studies in Deep Belief Networks
Deep networks are able to learn good representations of unlabelled data via a greedy layer-wise approach to training. One challenge arises in choosing the layer type to use: whether it is an autoencoder or a restricted Boltzmann machine, with or without sparsity regularization. The layer choice directly affects the type of representations learned. In this paper, we examine sparse autoencoders and...
Exploring Strategies for Training Deep Neural Networks
Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization often appears to get stuck in poor solutions. Hinton et al. recently proposed a greedy layer-wise u...
Deep Belief Networks
The important aspect of this layer-wise training procedure is that, provided the number of features per layer does not decrease, [6] showed that each extra layer increases a variational lower bound on the log probability of the data. So layer-by-layer training can be repeated several times to learn a deep, hierarchical model in which each layer of features captures strong high-order correlations b...
Publication year: 2006